Abstract

Transactional memory (TM) allows concurrent processes to organize sequences
of operations on shared data items into atomic transactions. A transaction may
commit, in which case it appears to have executed sequentially, or it may abort,
in which case no data item is updated.

The TM programming paradigm emerged as an alternative to conventional
fine-grained locking techniques, offering ease of programming and
compositionality. Though typically themselves implemented using locks,
TMs hide the inherent issues of lock-based synchronization behind a nice
transactional programming interface.

In this paper, we explore the inherent time and space complexity of lock-based
TMs, with a focus on the most popular class of progressive lock-based TMs.
We show that a progressive TM may force a read-only transaction to perform a
quadratic (in the number of data items it reads) number of steps and to access a
linear number of distinct memory locations, closing the question of the inherent
cost of read validation in TMs. We then show that the total number of remote
memory references (RMRs) that take place in an execution of a progressive TM in
which $n$ concurrent processes perform transactions on a single data item may
reach $\Omega(n \log n)$, which appears to be the first RMR complexity lower
bound for transactional memory.


1 Introduction 

Transactional memory (TM) allows concurrent processes to organize sequences
of operations on shared data items into atomic transactions. A transaction may
commit, in which case it appears to have executed sequentially, or it may abort, in
which case no data item is updated. The user can therefore design software having
only sequential semantics in mind and let the TM take care of handling conflicts
(concurrent reading and writing to the same data item) resulting from concurrent
executions. Another benefit of transactional memory over conventional lock-based
concurrent programming is compositionality: it allows the programmer to easily
compose multiple operations on multiple objects into atomic units, which is very
hard to achieve using locks directly. Therefore, while still typically implemented
using locks, TMs hide the inherent issues of lock-based programming behind an
easy-to-use and compositional transactional interface.

At a high level, a TM implementation must ensure that transactions are consistent
with some sequential execution. A natural consistency criterion is strict
serializability: all committed transactions appear to execute sequentially in some
total order respecting the timing of non-overlapping transactions. The stronger
criterion of opacity [14] guarantees that every transaction (including aborted and
incomplete ones) observes a view that is consistent with the same sequential
execution, which implies that no transaction exposes pathological behavior
not predicted by the sequential program, such as a division by zero or an infinite loop.

Notice that a TM implementation in which every transaction is aborted is
trivially opaque, but not very useful. Hence, the TM must satisfy some progress
guarantee specifying the conditions under which a transaction is allowed to abort.
It is typically expected that a transaction aborts only because of data conflicts with
a concurrent one, e.g., when both are trying to access the same data item and
at least one of them is trying to update it. This progress guarantee, captured
formally by the criterion of progressiveness [13], is satisfied by most TM
implementations today [6, 7].

There are two design principles which state-of-the-art TM implementations [6]
adhere to: read invisibility [4, 13] and disjoint-access parallelism.
Both are assumed to decrease the chances of a transaction encountering a data
conflict and, thus, to improve the performance of progressive TMs. Intuitively,
reads performed by a TM are invisible if they do not modify the shared memory
used by the TM implementation and, thus, do not affect other transactions. A
disjoint-access parallel (DAP) TM ensures that transactions accessing disjoint data
sets do not contend on the shared memory and, thus, may proceed independently.
As was observed earlier [14], the combination of these principles incurs some
inherent costs, and the main motivation of this paper is to explore these costs.

Intuitively, the overhead that invisible reads may incur comes from the need for
validation, i.e., ensuring that read data items have not been updated when the
transaction completes. Our first result (Section 4) is that a read-only transaction in
an opaque TM featuring weak DAP and weak invisible reads must incrementally
validate every next read operation. This results in a quadratic (in the size of the
transaction’s read set) step-complexity lower bound. Informally, weak DAP means
that two transactions encounter a memory race only if their data sets are connected
in the conflict graph, capturing data-set overlaps among all concurrent transactions.
Weak read invisibility allows read operations of a transaction T to be “visible” only
if T is concurrent with another transaction. The lower bound is derived for minimal
progressiveness, where transactions are guaranteed to commit only if they run
sequentially. Our result improves the lower bound [14] derived for strict data
partitioning (a very strong version of DAP) and (strong) invisible reads.
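The quadratic cost is easiest to see on one concrete validation strategy. The sketch below is hypothetical: the class `VersionedMemory`, the function `read_only_txn` and the per-item version numbers are assumptions for illustration, not the TMs analysed in this paper. A read-only transaction with invisible reads re-checks every earlier read each time it reads a new item, so reading $m$ items costs $m + m(m-1)/2$ base-object loads.

```python
from typing import Dict, List, Optional, Tuple


class VersionedMemory:
    """Hypothetical base memory: each t-object maps to a (value, version) pair."""

    def __init__(self, values: Dict[str, int]) -> None:
        self.cells: Dict[str, Tuple[int, int]] = {x: (v, 0) for x, v in values.items()}
        self.loads = 0  # number of base-object reads performed so far

    def load(self, x: str) -> Tuple[int, int]:
        self.loads += 1
        return self.cells[x]


def read_only_txn(mem: VersionedMemory, items: List[str]) -> Optional[List[int]]:
    """Reads `items` in order with invisible reads (it never writes to `mem`).

    Before accepting each new value it re-loads every earlier item and checks
    that its version is unchanged (incremental validation); it returns None,
    i.e. aborts, if some earlier read has become stale.
    """
    read_set: List[Tuple[str, int]] = []
    values: List[int] = []
    for x in items:
        value, version = mem.load(x)
        for y, seen in read_set:  # revalidate all previous reads
            if mem.load(y)[1] != seen:
                return None
        read_set.append((x, version))
        values.append(value)
    return values
```

For ten items, the k-th read performs one load plus k − 1 validation loads, 55 loads in total. The theorem in Section 4 says that, under its assumptions, no implementation can do asymptotically better in the worst case; the sketch only shows one way the cost arises.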

Our second result is that, under weak DAP and weak read invisibility, a strictly 
serializable TM must have a read-only transaction that accesses a linear (in the size
of the transaction’s read set) number of distinct memory locations in the course of 
performing its last read operation. Naturally, this space lower bound also applies 
to opaque TMs. 

We then turn our focus to strongly progressive TMs [14] that, in addition to
progressiveness, ensure that not all concurrent transactions conflicting over a single
data item abort. In Section 5, we prove that in any strongly progressive strictly
serializable TM implementation that accesses the shared memory with read, write and
conditional primitives, such as compare-and-swap and load-linked/store-conditional,
the total number of remote memory references (RMRs) that take place in an
execution in which $n$ concurrent processes perform transactions on a single data
item may reach $\Omega(n \log n)$. The result is obtained via a reduction
to an analogous lower bound for mutual exclusion. In the reduction, we show
that any TM with the above properties can be used to implement deadlock-free
mutual exclusion, employing transactional operations on only one data item and
incurring a constant RMR overhead. The lower bound applies to RMRs in both the
cache-coherent (CC) and distributed shared memory (DSM) models, and it appears
to be the first RMR complexity lower bound for transactional memory.


2 Model 


TM interface. A transactional memory (in short, TM) supports transactions for
reading and writing on a finite set of data items, referred to as t-objects. Every
transaction $T_k$ has a unique identifier $k$. We assume no bound on the size of a
t-object, i.e., on the cardinality of the set $V$ of possible different values a t-object
can have. A transaction $T_k$ may contain the following t-operations, each being a
matching pair of an invocation and a response: $read_k(X)$ returns a value in some
domain $V$ (denoted $read_k(X) \rightarrow v$) or a special value $A_k \notin V$ (abort);
$write_k(X, v)$, for a value $v \in V$, returns $ok$ or $A_k$; $tryC_k$ returns $C_k$ (commit) or $A_k$.
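A minimal single-threaded sketch of this interface follows, assuming deferred updates (writes are buffered and installed at commit time). The class names and the string stand-ins `"A"`, `"C"` and `"ok"` are hypothetical; the point is only the shape of the t-operations and their responses. Because this toy runs one transaction at a time, it never has a reason to return $A_k$.

```python
from typing import Dict, Union

ABORT = "A"   # stands in for the special value A_k (never returned by this toy)
COMMIT = "C"  # stands in for C_k
OK = "ok"


class ToyTM:
    """Hypothetical TM over integer-valued t-objects, for one transaction at a time."""

    def __init__(self, initial: Dict[str, int]) -> None:
        self.committed: Dict[str, int] = dict(initial)
        self._next_id = 0

    def begin(self) -> "ToyTransaction":
        self._next_id += 1  # every transaction T_k gets a unique identifier k
        return ToyTransaction(self, self._next_id)


class ToyTransaction:
    def __init__(self, tm: ToyTM, k: int) -> None:
        self.tm, self.k = tm, k
        self.buffer: Dict[str, int] = {}  # deferred writes, installed at commit

    def read(self, x: str) -> Union[int, str]:
        # read_k(X): own earlier write if any, else the last committed value
        return self.buffer.get(x, self.tm.committed[x])

    def write(self, x: str, v: int) -> str:
        # write_k(X, v) -> ok
        self.buffer[x] = v
        return OK

    def try_commit(self) -> str:
        # tryC_k -> C_k: with no concurrency, nothing forces an abort
        self.tm.committed.update(self.buffer)
        return COMMIT
```

Writes stay invisible to other transactions until `try_commit` installs them, which is one common design for lock-based TMs; it is not implied by the interface itself.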


Implementations. We assume an asynchronous shared-memory system in which
a set of $n > 1$ processes $p_1, \ldots, p_n$ communicate by applying operations on shared
objects. An object is an instance of an abstract data type which specifies a set of
operations that provide the only means to manipulate the object. An implementation
of an object type provides a specific data representation of that type by applying
primitives on shared base objects, each of which is assigned an initial value, and a
set of algorithms, one for each process. We assume that these primitives
are deterministic. Specifically, a TM implementation provides processes with
algorithms for implementing $read_k$, $write_k$ and $tryC_k()$ of a transaction $T_k$ by
applying primitives from a set of shared base objects. We assume that processes issue
transactions sequentially, i.e., a process starts a new transaction only after the
previous transaction is committed or aborted. A primitive is a generic read-modify-write
(RMW) procedure applied to a base object [10, 15]. It is characterized by a pair of
functions $\langle g, h \rangle$: given the current state of the base object, $g$ is an update function
that computes its state after the primitive is applied, while $h$ is a response function
that specifies the outcome of the primitive returned to the process. An RMW
primitive is trivial if it never changes the value of the base object to which it is applied.
Otherwise, it is nontrivial. An RMW primitive $\langle g, h \rangle$ is conditional if there exist
$v, w$ such that $g(v, w) = v$ and there exist $v, w$ such that $g(v, w) \neq v$. For example,
compare-and-swap (CAS) and load-linked/store-conditional (LL/SC) are nontrivial
conditional RMW primitives, while fetch-and-add is an example of a nontrivial
RMW primitive that is not conditional.
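A small sketch of primitives as $\langle g, h \rangle$ pairs. As an assumption of this sketch, we fix each primitive's argument (the expected and new values of CAS, the increment of fetch-and-add) and treat $g$ as a function of the base object's state alone, which is one common reading of the definition; the checks quantify only over a finite sample of states, so they approximate the "never" and "there exist" of the text.

```python
from typing import Callable, Iterable, Tuple

# One primitive instance with its argument already fixed, as a pair (g, h):
# g maps the base object's current state to its new state,
# h maps the current state to the response returned to the process.
Primitive = Tuple[Callable[[int], int], Callable[[int], int]]


def cas(expected: int, new: int) -> Primitive:
    return (lambda v: new if v == expected else v), (lambda v: int(v == expected))


def fetch_and_add(delta: int) -> Primitive:
    return (lambda v: v + delta), (lambda v: v)


def plain_read() -> Primitive:
    return (lambda v: v), (lambda v: v)


def is_trivial(p: Primitive, states: Iterable[int]) -> bool:
    """Trivial: never changes the state (checked only on the sampled states)."""
    g, _ = p
    return all(g(v) == v for v in states)


def is_conditional(p: Primitive, states: Iterable[int]) -> bool:
    """Conditional: leaves some state unchanged and changes some other state."""
    g, _ = p
    sample = list(states)
    return any(g(v) == v for v in sample) and any(g(v) != v for v in sample)
```

With the states −5 to 5, `cas(0, 1)` is nontrivial and conditional, `fetch_and_add(1)` is nontrivial but not conditional, and a plain read is trivial.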

Executions and configurations. An event of a process $p_i$ (sometimes we say a
step of $p_i$) is an invocation or response of an operation performed by $p_i$, or an RMW
primitive $\langle g, h \rangle$ applied by $p_i$ to a base object $b$ along with its response $r$ (we call
it an RMW event and write $(b, \langle g, h \rangle, r, i)$). A configuration specifies the value of
each base object and the state of each process. The initial configuration is the
configuration in which all base objects have their initial values and all processes
are in their initial states.

An execution fragment is a (finite or infinite) sequence of events. An execution
of an implementation $I$ is an execution fragment where, starting from the initial
configuration, each event is issued according to $I$ and each response of an RMW
event $(b, \langle g, h \rangle, r, i)$ matches the state of $b$ resulting from all preceding events. An
execution $E \cdot E'$, denoting the concatenation of $E$ and $E'$, is an extension of $E$, and
we say that $E'$ extends $E$.

Let $E$ be an execution fragment. For every transaction identifier $k$, $E|k$ denotes
the subsequence of $E$ restricted to events of transaction $T_k$. If $E|k$ is non-empty,
we say that $T_k$ participates in $E$; otherwise we say $E$ is $T_k$-free. Two executions $E$ and
$E'$ are indistinguishable to a set $\mathcal{T}$ of transactions if, for each transaction $T_k \in \mathcal{T}$,
$E|k = E'|k$. A TM history is the subsequence of an execution consisting of the
invocation and response events of t-operations.
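Projection and indistinguishability are simple list operations once an execution is written down as a sequence of events. In this sketch the `Event` record and its string fields are a hypothetical encoding; only the transaction identifier matters for $E|k$.

```python
from typing import List, NamedTuple, Set


class Event(NamedTuple):
    txn: int      # identifier k of the transaction T_k that issued the event
    kind: str     # "inv", "resp" or "rmw" (hypothetical encoding)
    detail: str   # e.g. "read(X)", "0", "CAS(b)"


def restrict(execution: List[Event], k: int) -> List[Event]:
    """E|k: the subsequence of E restricted to events of T_k."""
    return [e for e in execution if e.txn == k]


def participates(execution: List[Event], k: int) -> bool:
    return bool(restrict(execution, k))


def indistinguishable(e1: List[Event], e2: List[Event], txns: Set[int]) -> bool:
    """E and E' are indistinguishable to txns if E|k = E'|k for each T_k in txns."""
    return all(restrict(e1, k) == restrict(e2, k) for k in txns)
```

Reordering events of different transactions, while keeping each transaction's own subsequence intact, yields an execution that those transactions cannot tell apart; this is the kind of argument used in the proofs of Section 4.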

The read set (resp., the write set) of a transaction $T_k$ in an execution $E$, denoted
$Rset(T_k)$ (resp., $Wset(T_k)$), is the set of t-objects on which $T_k$ invokes reads
(resp., writes) in $E$. The data set of $T_k$ is $Dset(T_k) = Rset(T_k) \cup Wset(T_k)$.
A transaction is called read-only if $Wset(T_k) = \emptyset$, write-only if $Rset(T_k) = \emptyset$, and
updating if $Wset(T_k) \neq \emptyset$. Note that, in our TM model, the data set of a transaction
is not known a priori, i.e., at the start of the transaction; it is identifiable only
by the set of data items on which the transaction has invoked a read or write in the
given execution.
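A direct transcription of these sets, computed from the t-operations a transaction has invoked so far (the `(kind, t-object)` encoding is an assumption of this sketch). Note that, taken literally, the definitions label a transaction with no operations as both read-only and write-only.

```python
from typing import FrozenSet, List, Set, Tuple

Op = Tuple[str, str]  # ("read" | "write", t-object), in the order T_k invoked them


def data_sets(ops: List[Op]) -> Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]:
    """Rset, Wset and Dset = Rset | Wset of one transaction in a given execution."""
    rset = frozenset(x for kind, x in ops if kind == "read")
    wset = frozenset(x for kind, x in ops if kind == "write")
    return rset, wset, rset | wset


def classify(ops: List[Op]) -> Set[str]:
    rset, wset, _ = data_sets(ops)
    labels = set()
    if not wset:
        labels.add("read-only")
    if not rset:
        labels.add("write-only")
    if wset:
        labels.add("updating")
    return labels
```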

Transaction orders. Let $txns(E)$ denote the set of transactions that participate in
$E$. An execution $E$ is sequential if every invocation of a t-operation is either the
last event in the history $H$ exported by $E$ or is immediately followed by a matching
response. We assume that executions are well-formed: no process invokes a new
operation before the previous operation returns. Specifically, we assume that for
all $T_k$, $E|k$ begins with the invocation of a t-operation, is sequential and has no
events after $A_k$ or $C_k$. A transaction $T_k \in txns(E)$ is complete in $E$ if $E|k$ ends with
a response event. The execution $E$ is complete if all transactions in $txns(E)$ are
complete in $E$. A transaction $T_k \in txns(E)$ is t-complete if $E|k$ ends with $A_k$ or $C_k$;
otherwise, $T_k$ is t-incomplete. $T_k$ is committed (resp., aborted) in $E$ if the last event
of $T_k$ is $C_k$ (resp., $A_k$). The execution $E$ is t-complete if all transactions in $txns(E)$
are t-complete.

For transactions $T_k, T_m \in txns(E)$, we say that $T_k$ precedes $T_m$ in the real-time
order of $E$, denoted $T_k \prec_E^{RT} T_m$, if $T_k$ is t-complete in $E$ and the last event of $T_k$
precedes the first event of $T_m$ in $E$. If neither $T_k \prec_E^{RT} T_m$ nor $T_m \prec_E^{RT} T_k$, then $T_k$ and
$T_m$ are concurrent in $E$. An execution $E$ is t-sequential if there are no concurrent
transactions in $E$.
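The real-time order depends only on where each transaction's first and last events fall in $E$ and on whether it is t-complete. A minimal sketch, assuming each transaction is summarized by a hypothetical `TxnSpan` record of event positions:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TxnSpan:
    first: int         # index in E of T_k's first event
    last: int          # index in E of T_k's last event
    t_complete: bool   # True if T_k's last event is C_k or A_k


def precedes(a: TxnSpan, b: TxnSpan) -> bool:
    """a precedes b in real-time order: a is t-complete and ends before b starts."""
    return a.t_complete and a.last < b.first


def concurrent(a: TxnSpan, b: TxnSpan) -> bool:
    return not precedes(a, b) and not precedes(b, a)


def t_sequential(spans: List[TxnSpan]) -> bool:
    """No two distinct transactions of E are concurrent."""
    return not any(
        concurrent(a, b) for i, a in enumerate(spans) for b in spans[i + 1:]
    )
```

A t-incomplete transaction never precedes anything, so it is concurrent with every transaction that does not precede it, even one that starts after its last event.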

Contention. We say that a configuration $C$ after an execution $E$ is quiescent
(resp., t-quiescent) if every transaction $T_k \in txns(E)$ is complete (resp.,
t-complete) in $C$. If a transaction $T$ is incomplete in an execution $E$, it has exactly
one enabled event, which is the next event the transaction will perform according
to the TM implementation. Events $e$ and $e'$ of an execution $E$ contend on a base
object $b$ if they are both events on $b$ in $E$ and at least one of them is nontrivial (an
event is trivial (resp., nontrivial) if it is the application of a trivial (resp.,
nontrivial) primitive). We say that a transaction $T$ is poised to apply an event $e$
after $E$ if $e$ is the next enabled event for $T$ in $E$. We say that transactions $T$ and $T'$
concurrently contend on $b$ in $E$ if they are each poised to apply contending events
on $b$ after $E$.

We say that an execution fragment $E$ is step contention-free for a t-operation $op_k$
if the events of $E|op_k$ are contiguous in $E$. We say that $E$ is
step contention-free for $T_k$ if the events of $E|k$ are contiguous in $E$. We say that $E$ is
step contention-free if $E$ is step contention-free for all transactions that participate
in $E$.
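Both notions can be checked mechanically on a recorded sequence of primitive applications. In this sketch the `Step` record is a hypothetical encoding that keeps only the issuing transaction, the base object and whether the primitive is nontrivial; the same contiguity check applies to the events of a single t-operation $op_k$.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    txn: int          # transaction applying the primitive
    obj: str          # base object it is applied to
    nontrivial: bool  # True if the primitive may change the object's state


def contend(e: Step, f: Step) -> bool:
    """Two events contend on b if both access b and at least one is nontrivial."""
    return e.obj == f.obj and (e.nontrivial or f.nontrivial)


def step_contention_free_for(execution: List[Step], k: int) -> bool:
    """True iff the events of T_k form one contiguous block of the execution."""
    pos = [i for i, s in enumerate(execution) if s.txn == k]
    return not pos or pos[-1] - pos[0] + 1 == len(pos)


def step_contention_free(execution: List[Step]) -> bool:
    return all(step_contention_free_for(execution, k) for k in {s.txn for s in execution})
```

Two trivial events (for example, two plain reads) on the same base object do not contend, which is what allows invisible reads to coexist with DAP-style arguments.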


3 TM classes 

TM-correctness. We say that $read_k(X)$ is legal in a t-sequential execution $E$ if it
returns the latest written value of $X$, and $E$ is legal if every $read_k(X)$ in $E$ that does
not return $A_k$ is legal in $E$.

A finite history $H$ is opaque if there is a legal t-complete t-sequential history
$S$ such that (1) for any two transactions $T_k, T_m \in txns(H)$, if $T_k \prec_H^{RT} T_m$, then $T_k$
precedes $T_m$ in $S$, and (2) $S$ is equivalent to a completion of $H$.

A finite history $H$ is strictly serializable if there is a legal t-complete t-sequential
history $S$ such that (1) for any two transactions $T_k, T_m \in txns(H)$, if $T_k \prec_H^{RT} T_m$, then
$T_k$ precedes $T_m$ in $S$, and (2) $S$ is equivalent to $cseq(\bar{H})$, where $\bar{H}$ is some
completion of $H$ and $cseq(\bar{H})$ is the subsequence of $\bar{H}$ reduced to committed
transactions in $\bar{H}$.
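For very small histories, strict serializability can be checked by brute force: try every order of the committed transactions that respects the real-time order and replay it as a t-sequential history. This is a hypothetical exponential-time sketch; it takes only the committed transactions as input (the role of $cseq(\bar{H})$), so writes of aborted transactions are ignored. Opacity would additionally require aborted and incomplete transactions to fit the same order, which this sketch does not check.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, int]  # ("r" | "w", t-object, value read or written)


def legal(order: List[int], txns: Dict[int, List[Op]], init: Dict[str, int]) -> bool:
    """Replays the transactions one after another (a t-sequential history) and
    checks that every read returns the latest value written before it."""
    state = dict(init)
    for k in order:
        for kind, x, v in txns[k]:
            if kind == "w":
                state[x] = v
            elif state[x] != v:
                return False
    return True


def strictly_serializable(committed: Dict[int, List[Op]],
                          rt_before: Set[Tuple[int, int]],
                          init: Dict[str, int]) -> bool:
    """Brute force over all orders of the committed transactions (tiny inputs only).
    rt_before holds (a, b) whenever T_a precedes T_b in real-time order."""
    for order in permutations(committed):
        pos = {k: i for i, k in enumerate(order)}
        respects_rt = all(pos[a] < pos[b] for a, b in rt_before if a in pos and b in pos)
        if respects_rt and legal(list(order), committed, init):
            return True
    return False
```

The test inputs mirror the situation in the proof of Claim 4: a reader that sees the initial value of $X_1$ but the new value of $X_2$, where the writer of $X_1$ precedes the writer of $X_2$ in real time, admits no legal serialization.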

We refer to $S$ as an opaque (resp., strictly serializable) serialization of $H$.

TM-liveness. We say that a TM implementation $M$ provides interval contention-free
(ICF) TM-liveness if, for every finite execution $E$ of $M$ such that the configuration
after $E$ is quiescent, and every transaction $T_k$ that applies the invocation of
a t-operation $op_k$ immediately after $E$, the finite step contention-free extension for
$op_k$ contains a matching response.

TM-progress. We say that a TM implementation provides sequential TM-progress
(also called minimal progressiveness [14]) if every transaction running step
contention-free from a t-quiescent configuration commits within a finite number of steps.

We say that transactions $T_i, T_j$ conflict in an execution $E$ on a t-object $X$ if
$X \in Dset(T_i) \cap Dset(T_j)$ and $X \in Wset(T_i) \cup Wset(T_j)$.

A TM implementation $M$ provides progressive TM-progress (or progressiveness)
if, for every execution $E$ of $M$ and every transaction $T_i \in txns(E)$ that returns
$A_i$ in $E$, there exists a transaction $T_k \in txns(E)$ such that $T_k$ and $T_i$ are concurrent
and conflict in $E$.
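The conflict relation and the progressiveness condition can be checked on one recorded history, given each transaction's data and write sets and which pairs were concurrent. The dictionary-based inputs are an assumed encoding; note that progressiveness is a property of all executions of $M$, so a single passing history proves nothing about the implementation.

```python
from typing import Dict, FrozenSet, Set


def conflict_objects(dset_i: Set[str], wset_i: Set[str],
                     dset_j: Set[str], wset_j: Set[str]) -> Set[str]:
    """t-objects X with X in Dset(T_i) & Dset(T_j) and X in Wset(T_i) | Wset(T_j)."""
    return (dset_i & dset_j) & (wset_i | wset_j)


def progressive_history(aborted: Set[int],
                        concurrent_pairs: Set[FrozenSet[int]],
                        dsets: Dict[int, Set[str]],
                        wsets: Dict[int, Set[str]]) -> bool:
    """Every aborted T_i has some concurrent T_k with which it conflicts."""
    return all(
        any(
            conflict_objects(dsets[i], wsets[i], dsets[k], wsets[k])
            for k in dsets
            if k != i and frozenset((i, k)) in concurrent_pairs
        )
        for i in aborted
    )
```

Two concurrent readers of the same t-object do not conflict, so a progressive TM may not abort either of them on that account.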

Let $CObj_H(T_i)$ denote the set of t-objects over which transaction $T_i \in txns(H)$
conflicts with any other transaction in history $H$, i.e., $X \in CObj_H(T_i)$ iff there exists
a transaction $T_k$ such that $T_i$ and $T_k$ conflict on $X$ in $H$. Let $Q \subseteq txns(H)$ and
$CObj_H(Q) = \bigcup_{T_i \in Q} CObj_H(T_i)$.

Let $CTrans(H)$ denote the set of non-empty subsets of $txns(H)$ such that a set
$Q$ is in $CTrans(H)$ if no transaction in $Q$ conflicts with a transaction not in $Q$.

Definition 1. A TM implementation $M$ is strongly progressive if $M$ is progressive
and, for every history $H$ of $M$ and for every set $Q \in CTrans(H)$ such that
$|CObj_H(Q)| \leq 1$, some transaction in $Q$ is not aborted in $H$.
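The additional condition of Definition 1 can be tested on one history by enumerating the conflict-closed sets $Q$. This sketch is hypothetical and exponential in the number of transactions; conflicts are given as a map from unordered pairs of transaction identifiers to the t-objects they conflict on, and only the extra condition is checked (Definition 1 also requires $M$ to be progressive).

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Conflicts = Dict[FrozenSet[int], Set[str]]  # {T_i, T_k} -> t-objects they conflict on


def cobj(q: FrozenSet[int], conflicts: Conflicts) -> Set[str]:
    """CObj_H(Q): t-objects over which some member of Q conflicts with another transaction."""
    objs: Set[str] = set()
    for pair, xs in conflicts.items():
        if pair & q:
            objs |= xs
    return objs


def ctrans(txns: Set[int], conflicts: Conflicts) -> List[FrozenSet[int]]:
    """Non-empty Q such that no member of Q conflicts with a transaction outside Q."""
    result = []
    for r in range(1, len(txns) + 1):
        for members in combinations(sorted(txns), r):
            q = frozenset(members)
            if all(pair <= q for pair, xs in conflicts.items() if xs and pair & q):
                result.append(q)
    return result


def strong_progress_condition(txns: Set[int], conflicts: Conflicts, aborted: Set[int]) -> bool:
    """For every Q in CTrans(H) with |CObj_H(Q)| <= 1, some member of Q is not aborted."""
    return all(
        not q <= aborted
        for q in ctrans(txns, conflicts)
        if len(cobj(q, conflicts)) <= 1
    )
```

In the test history, $T_1$ and $T_2$ conflict only on one t-object, so aborting both of them violates the condition, while aborting just one does not.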


Invisible reads. A TM implementation $M$ uses invisible reads if, for every execution
$E$ of $M$ and for every read-only transaction $T_k \in txns(E)$, $E|k$ does not contain
any nontrivial events.

In this paper, we introduce a definition of weak invisible reads. For any execution
$E$ and any t-operation $\pi_k$ invoked by some transaction $T_k \in txns(E)$, let $E|\pi_k$
denote the subsequence of $E$ restricted to events of $\pi_k$ in $E$.

We say that a TM implementation $M$ satisfies weak invisible reads if, for any
execution $E$ of $M$ and every transaction $T_k \in txns(E)$ with $Rset(T_k) \neq \emptyset$ that is not
concurrent with any transaction $T_m \in txns(E)$, $E|\pi_k$ does not contain any nontrivial
events, where $\pi_k$ is any t-read operation invoked by $T_k$ in $E$.
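The two definitions differ in which events they constrain: invisible reads forbid any nontrivial event of a read-only transaction (including its tryC), while weak invisible reads constrain only the t-read events of transactions that run without concurrency. A sketch under an assumed encoding, where each event is tagged with the t-operation that issued it and t-reads are recognized by a `"read"` prefix, and where the set of non-concurrent readers is computed elsewhere:

```python
from typing import List, NamedTuple, Set


class Ev(NamedTuple):
    txn: int          # transaction T_k issuing the event
    op: str           # t-operation the event belongs to, e.g. "read(X)" or "tryC"
    nontrivial: bool  # True if the event applies a nontrivial primitive


def invisible_reads(execution: List[Ev], read_only: Set[int]) -> bool:
    """No event of a read-only transaction is nontrivial (including its tryC)."""
    return not any(e.nontrivial for e in execution if e.txn in read_only)


def weak_invisible_reads(execution: List[Ev], isolated_readers: Set[int]) -> bool:
    """isolated_readers: transactions with a non-empty read set that are not
    concurrent with any other transaction. Only their t-read events are checked."""
    return not any(
        e.nontrivial
        for e in execution
        if e.txn in isolated_readers and e.op.startswith("read")
    )
```

In the test, a read-only transaction whose tryC applies a nontrivial primitive violates invisible reads but still satisfies weak invisible reads.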


Disjoint-access parallelism (DAP). Let $\tau_E(T_i, T_j)$ be the set of transactions ($T_i$ and
$T_j$ included) that are concurrent to at least one of $T_i$ and $T_j$ in $E$. Let $G(T_i, T_j, E)$ be
an undirected graph whose vertex set is $\bigcup_{T \in \tau_E(T_i, T_j)} Dset(T)$ and in which there is an
edge between t-objects $X$ and $Y$ iff there exists $T \in \tau_E(T_i, T_j)$ such that $\{X, Y\} \subseteq Dset(T)$.
We say that $T_i$ and $T_j$ are disjoint-access in $E$ if there is no path between a t-object
in $Dset(T_i)$ and a t-object in $Dset(T_j)$ in $G(T_i, T_j, E)$. A TM implementation $M$ is
weak disjoint-access parallel (weak DAP) if, for all executions $E$ of $M$, transactions
$T_i$ and $T_j$ concurrently contend on the same base object in $E$ only if $T_i$ and $T_j$
are not disjoint-access in $E$ or there exists a t-object $X \in Dset(T_i) \cap Dset(T_j)$ [5].
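Deciding whether two transactions are disjoint-access amounts to a reachability question in $G(T_i, T_j, E)$. The sketch below takes the set $\tau_E(T_i, T_j)$ as an input computed elsewhere (an assumption), builds the graph and runs a breadth-first search. A t-object shared by $T_i$ and $T_j$ counts as a path of length zero, which makes the second disjunct of weak DAP redundant here; it is kept to mirror the definition.

```python
from collections import deque
from typing import Dict, Set


def disjoint_access(dsets: Dict[int, Set[str]], tau: Set[int], i: int, j: int) -> bool:
    """tau plays the role of tau_E(T_i, T_j) and must contain i and j.
    Builds G(T_i, T_j, E) and searches for a path from Dset(T_i) to Dset(T_j)."""
    adj: Dict[str, Set[str]] = {}
    for t in tau:
        for x in dsets[t]:
            # every pair of t-objects accessed by the same transaction is an edge
            adj.setdefault(x, set()).update(dsets[t] - {x})
    seen = set(dsets[i])
    frontier = deque(dsets[i])
    while frontier:  # breadth-first search
        x = frontier.popleft()
        for y in adj.get(x, set()):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    # a shared t-object counts as a (zero-length) path
    return not (seen & dsets[j])


def weak_dap_allows_contention(dsets: Dict[int, Set[str]], tau: Set[int], i: int, j: int) -> bool:
    """Weak DAP permits T_i and T_j to concurrently contend only in this case."""
    return not disjoint_access(dsets, tau, i, j) or bool(dsets[i] & dsets[j])
```

In the test, $T_1$ (on $X$) and $T_2$ (on $Z$) become connected only when both $T_3$ (on $X, Y$) and $T_4$ (on $Y, Z$) are concurrent with them.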
Figure 1: Executions in the proof of Lemma 2. In both panels, $T_\phi$ performs $i-1$
t-reads followed by $R_\phi(X_i) \rightarrow nv$, and $T_i$ performs $W_i(X_i, nv)$ and commits.
(a) $R_\phi(X_i)$ must return $nv$ by strict serializability. (b) $T_i$ does not observe any
conflict with $T_\phi$; by weak DAP, $T_\phi$ cannot distinguish this execution from the one
in Figure 1a.


Lemma 1. Let $M$ be any weak DAP TM implementation. Let $\alpha \cdot \rho_1 \cdot \rho_2$
be any execution of $M$ where $\rho_1$ (resp., $\rho_2$) is the step contention-free execution
fragment of transaction $T_1 \notin txns(\alpha)$ (resp., $T_2 \notin txns(\alpha)$) and transactions $T_1$,
$T_2$ are disjoint-access in $\alpha \cdot \rho_1 \cdot \rho_2$. Then, $T_1$ and $T_2$ do not contend on any base
object in $\alpha \cdot \rho_1 \cdot \rho_2$.


4 Time and space complexity of sequential TMs 

In this section, we prove that (1) a read-only transaction in an opaque TM featuring
weak DAP and weak invisible reads must incrementally validate every
next read operation, and (2) a strictly serializable TM (under weak DAP and weak
read invisibility) must have a read-only transaction that accesses a linear (in the
size of the transaction’s read set) number of distinct base objects in the course of
performing its last t-read and tryCommit operations.

We first prove the following lemma concerning strictly serializable weak DAP 
TM implementations. 

Lemma 2. Let $M$ be any strictly serializable, weak DAP TM implementation that
provides sequential TM-progress. Then, for all $i \in \mathbb{N}$, $M$ has an execution of the
form $\pi^{i-1} \cdot \rho^i \cdot \alpha_i$, where

• $\pi^{i-1}$ is the complete step contention-free execution of a read-only transaction
$T_\phi$ that performs $(i-1)$ t-reads: $read_\phi(X_1) \cdots read_\phi(X_{i-1})$,

• $\rho^i$ is the t-complete step contention-free execution of a transaction $T_i$ that
writes $nv_i \neq v_i$ to $X_i$ and commits,

• $\alpha_i$ is the complete step contention-free execution fragment of $T_\phi$ that
performs its $i$-th t-read: $read_\phi(X_i) \rightarrow nv_i$.
Proof. By sequential TM-progress, $M$ has an execution of the form $\rho^i \cdot \pi^{i-1}$. Since
$Dset(T_\phi) \cap Dset(T_i) = \emptyset$ in $\rho^i \cdot \pi^{i-1}$, by Lemma 1, transactions $T_\phi$ and $T_i$ do not
contend on any base object in execution $\rho^i \cdot \pi^{i-1}$. Thus, $\pi^{i-1} \cdot \rho^i$ is also an
execution of $M$.

By assumption of strict serializability, $\rho^i \cdot \pi^{i-1} \cdot \alpha_i$ is an execution of $M$ in
which the t-read of $X_i$ performed by $T_\phi$ must return $nv_i$. But $\rho^i \cdot \pi^{i-1} \cdot \alpha_i$ is
indistinguishable to $T_\phi$ from $\pi^{i-1} \cdot \rho^i \cdot \alpha_i$. Thus, $M$ has an execution of the form
$\pi^{i-1} \cdot \rho^i \cdot \alpha_i$. □

Theorem 3. For every weak DAP TM implementation $M$ that provides ICF
TM-liveness and sequential TM-progress and uses weak invisible reads:

(1) if $M$ is opaque, then for every $m \in \mathbb{N}$, there exists an execution $E$ of $M$ such that
some transaction $T_\phi \in txns(E)$ performs $\Omega(m^2)$ steps, where $m = |Rset(T_\phi)|$;

(2) if $M$ is strictly serializable, then for every $m \in \mathbb{N}$, there exists an execution $E$ of
$M$ such that some transaction $T_\phi \in txns(E)$ accesses at least $m-1$ distinct
base objects during the executions of its $m$-th t-read operation and $tryC_\phi()$,
where $m = |Rset(T_\phi)|$.

Proof. For all $i \in \{1, \ldots, m\}$, let $v$ be the initial value of t-object $X_i$.

(1) Suppose that $M$ is opaque. Let $\pi^m$ denote the complete step contention-free
execution of a transaction $T_\phi$ that performs $m$ t-reads $read_\phi(X_1) \cdots read_\phi(X_m)$
such that, for all $i \in \{1, \ldots, m\}$, $read_\phi(X_i) \rightarrow v$.

By Lemma 2, for all $i \in \{2, \ldots, m\}$, $M$ has an execution of the form
$E^i = \pi^{i-1} \cdot \rho^i \cdot \alpha_i$.

For each $i \in \{2, \ldots, m\}$, $j \in \{1, 2\}$ and $\ell \leq (i-1)$, we now define an execution
of the form $E^i_{j\ell} = \pi^{i-1} \cdot \beta^\ell \cdot \rho^i \cdot \alpha^i_j$ as follows:

• $\beta^\ell$ is the t-complete step contention-free execution fragment of a transaction
$T_\ell$ that writes $nv_\ell \neq v$ to $X_\ell$ and commits,

• $\alpha^i_1$ (resp., $\alpha^i_2$) is the complete step contention-free execution fragment of
$read_\phi(X_i) \rightarrow v$ (resp., $read_\phi(X_i) \rightarrow A_\phi$).

Claim 4. For all $i \in \{2, \ldots, m\}$ and $\ell \leq (i-1)$, $M$ has an execution of the form $E^i_{1\ell}$
or $E^i_{2\ell}$.

Proof. For all $i \in \{2, \ldots, m\}$, $\pi^{i-1}$ is an execution of $M$. By assumption of weak
invisible reads and sequential TM-progress, $T_\ell$ must be committed in $\pi^{i-1} \cdot \beta^\ell$, and
$M$ has an execution of the form $\pi^{i-1} \cdot \beta^\ell$. By the same reasoning, since $T_i$ and $T_\ell$
have disjoint data sets, $M$ has an execution of the form $\pi^{i-1} \cdot \beta^\ell \cdot \rho^i$.

Since the configuration after $\pi^{i-1} \cdot \beta^\ell \cdot \rho^i$ is quiescent, by ICF TM-liveness,
$\pi^{i-1} \cdot \beta^\ell \cdot \rho^i$ extended with $read_\phi(X_i)$ must return a matching response. If
$read_\phi(X_i) \rightarrow v$, then clearly $E^i_{1\ell}$ is an execution of $M$, with $T_\phi, T_\ell, T_i$ being a valid
serialization of the transactions. If $read_\phi(X_i) \rightarrow A_\phi$, the same serialization justifies
an opaque execution.

Suppose, by contradiction, that there exists an execution of $M$ in which
$\pi^{i-1} \cdot \beta^\ell \cdot \rho^i$ is extended with the complete execution of $read_\phi(X_i) \rightarrow r$, where
$r \notin \{A_\phi, v\}$. The only plausible case to analyse is $r = nv_i$. Since $read_\phi(X_i)$ returns
the value of $X_i$ updated by $T_i$, the only possible serialization of the transactions is
$T_\ell, T_i, T_\phi$; but $read_\phi(X_\ell)$ performed by $T_\phi$, which returns the initial value $v$, is
not legal in this serialization, a contradiction. □

We now prove that, for all $i \in \{2, \ldots, m\}$, $j \in \{1, 2\}$ and $\ell \leq (i-1)$, transaction $T_\phi$
must access $(i-1)$ different base objects during the execution of $read_\phi(X_i)$ in the
execution $\pi^{i-1} \cdot \beta^\ell \cdot \rho^i \cdot \alpha^i_j$.

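By the assumption of weak invisible reads, the execution $\pi^{i-1} \cdot \beta^\ell \cdot \rho^i \cdot \alpha^i_j$ is
indistinguishable to transactions $T_\ell$ and $T_i$ from the execution $\tilde{\pi}^{i-1} \cdot \beta^\ell \cdot \rho^i \cdot \alpha^i_j$,
where $Rset(T_\phi) = \emptyset$ in $\tilde{\pi}^{i-1}$. But transactions $T_\ell$ and $T_i$ are disjoint-access in
$\tilde{\pi}^{i-1} \cdot \beta^\ell \cdot \rho^i$, and by Lemma 1 they cannot contend on the same base object in this
execution.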

Consider the $(i-1)$ different executions $\pi^{i-1} \cdot \beta^1 \cdot \rho^i, \ldots, \pi^{i-1} \cdot \beta^{i-1} \cdot \rho^i$. For
all $\ell, \ell' \leq (i-1)$ with $\ell' \neq \ell$, $M$ has an execution of the form $\pi^{i-1} \cdot \beta^\ell \cdot \rho^i \cdot \beta^{\ell'}$ in which
transactions $T_\ell$ and $T_{\ell'}$ access mutually disjoint data sets. By weak invisible reads
and Lemma 1, the pairs of transactions $T_{\ell'}, T_i$ and $T_{\ell'}, T_\ell$ do not contend on any
base object in this execution. This implies that $\pi^{i-1} \cdot \beta^\ell \cdot \beta^{\ell'} \cdot \rho^i$ is an execution
of $M$ in which transactions $T_\ell$ and $T_{\ell'}$ each apply nontrivial primitives to mutually
disjoint sets of base objects in the execution fragments $\beta^\ell$ and $\beta^{\ell'}$, respectively (by
Lemma 1).

This implies that, for any $j \in \{1, 2\}$ and $\ell \leq (i-1)$, the configuration $C^i$ after the
prefix $\pi^{i-1} \cdot \rho^i$ of $E^i$ differs from the configuration after $\pi^{i-1} \cdot \beta^\ell \cdot \rho^i$ only in the
states of the base objects that are accessed in the fragment $\beta^\ell$. Consequently, transaction
$T_\phi$ must access at least $i-1$ different base objects in the execution fragment $\alpha^i_j$ to
distinguish configuration $C^i$ from the configurations that result after the $(i-1)$ different
executions $\pi^{i-1} \cdot \beta^1 \cdot \rho^i, \ldots, \pi^{i-1} \cdot \beta^{i-1} \cdot \rho^i$, respectively.

Thus, for all $i \in \{2, \ldots, m\}$, transaction $T_\phi$ must perform at least $i-1$ steps
while executing the $i$-th t-read in $\alpha^i_j$, and $T_\phi$ itself must perform
$\sum_{i=1}^{m-1} i = \frac{m(m-1)}{2} = \Omega(m^2)$ steps.

(2) Suppose that $M$ is strictly serializable, but not opaque. Since $M$ is strictly
serializable, by Lemma 2, it has an execution of the form $E = \pi^{m-1} \cdot \rho^m \cdot \alpha_m$.

For each $\ell \leq (m-1)$, we prove that $M$ has an execution of the form
$E_\ell = \pi^{m-1} \cdot \beta^\ell \cdot \rho^m \cdot \alpha^m$, where $\alpha^m$ is the complete step contention-free execution
fragment of $read_\phi(X_m)$ followed by the complete execution of $tryC_\phi$. Indeed, by weak
invisible reads, $\pi^{m-1}$ does not contain any nontrivial events, and the execution
$\pi^{m-1} \cdot \beta^\ell \cdot \rho^m$ is indistinguishable to transactions $T_\ell$ and $T_m$ from the executions
$\tilde{\pi}^{m-1} \cdot \beta^\ell$ and $\tilde{\pi}^{m-1} \cdot \rho^m$, respectively, where $Rset(T_\phi) = \emptyset$ in $\tilde{\pi}^{m-1}$. Thus,
applying Lemma 1, transactions $T_\ell$ and $T_m$ do not contend on any base object in the
execution $\pi^{m-1} \cdot \beta^\ell \cdot \rho^m$. By ICF TM-liveness, $read_\phi(X_m)$ and $tryC_\phi$ must return
matching responses in the execution fragment $\alpha^m$ that extends $\pi^{m-1} \cdot \beta^\ell \cdot \rho^m$.
Consequently, for each $\ell \leq (m-1)$, $M$ has an execution of the form
$E_\ell = \pi^{m-1} \cdot \beta^\ell \cdot \rho^m \cdot \alpha^m$ such that
Algorithm 1 Mutual-exclusion object L from a strongly progressive, strictly
serializable TM M; code for process p_i, 1 ≤ i ≤ n

1:  Local variables:
2:      bit face_i, for each process p_i
3:  Shared objects:
4:      strongly progressive, strictly
5:      serializable TM M
6:      t-object X, initially ⊥
7:          storing value v ∈ {[p_i, face_i]} ∪ {⊥}
8:      for each tuple [p_i, face_i]
9:          Done[p_i, face_i] ∈ {true, false}
10:         Succ[p_i, face_i] ∈ {p_1, ..., p_n} ∪ {⊥}
11:     for each p_i and j ∈ {1, ..., n} \ {i}
12:         Lock[p_i][p_j] ∈ {locked, unlocked}

13: Function: func():
14:     atomic using M
15:         value := tx-read(X)
16:         tx-write(X, [p_i, face_i])
17:     on abort Return false
18:     Return value

19: Entry:
20:     face_i := 1 − face_i
21:     Done[p_i, face_i].write(false)
22:     Succ[p_i, face_i].write(⊥)
23:     while (prev ← func()) = false do
24:         no op
25:     end while
26:     if prev ≠ ⊥ then
27:         Lock[p_i][prev.pid].write(locked)
28:         Succ[prev].write(p_i)
29:         if Done[prev] = false then
30:             while Lock[p_i][prev.pid] ≠ unlocked do
31:                 no op
32:             end while
33:     Return ok
34:     // Critical section

35: Exit:
36:     Done[p_i, face_i].write(true)
37:     Lock[Succ[p_i, face_i]][p_i].write(unlocked)
38:     Return ok


transactions $T_\ell$ and $T_m$ do not contend on any base object.

Strict serializability of $M$ means that if $read_\phi(X_m) \rightarrow nv_m$ in the execution
fragment $\alpha^m$, then $tryC_\phi$ must return $A_\phi$. Otherwise, if $read_\phi(X_m) \rightarrow v$ (i.e., the
initial value of $X_m$), then $tryC_\phi$ may return $A_\phi$ or $C_\phi$.

Thus, as in (1), in the worst case, $T_\phi$ must access at least $m-1$ distinct base
objects during the executions of $read_\phi(X_m)$ and $tryC_\phi$ to distinguish the configuration
$C^m$ after the prefix $\pi^{m-1} \cdot \rho^m$ of $E$ from the configurations after the $m-1$ different
executions $\pi^{m-1} \cdot \beta^1 \cdot \rho^m, \ldots, \pi^{m-1} \cdot \beta^{m-1} \cdot \rho^m$, respectively. □

5 RMR complexity of strongly progressive TMs 

In this section, we prove that every strongly progressive strictly serializable TM that
uses only read, write and conditional RMW primitives has an execution in which
$n$ concurrent processes perform transactions on a single data item and
incur $\Omega(n \log n)$ remote memory references.

Remote memory references (RMR). In the cache-coherent (CC) shared memory,
each process maintains local copies of shared objects inside its cache, whose
consistency is ensured by a coherence protocol. Informally, we say that an access
to a base object $b$ is remote to a process $p$ and causes a remote memory reference
(RMR) if $p$'s cache holds no copy of $b$, or holds a copy that is out of date or
invalidated; otherwise, the access is local.

In the write-through CC protocol, to read a base object $b$, process $p$ must have
a cached copy of $b$ that has not been invalidated since its previous read; otherwise,
$p$ incurs an RMR. To write to $b$, $p$ causes an RMR that invalidates all cached copies
of $b$ and writes to the main memory.

In the write-back CC protocol, $p$ reads a base object $b$ without causing an RMR
if it holds a cached copy of $b$ in shared or exclusive mode; otherwise, the access of
$b$ causes an RMR that (1) invalidates all copies of $b$ held in exclusive mode and
writes $b$ back to the main memory, and (2) creates a cached copy of $b$ in shared mode.
Process $p$ can write to $b$ without causing an RMR if it holds a copy of $b$ in exclusive
mode; otherwise, $p$ causes an RMR that invalidates all cached copies of $b$ and creates
a cached copy of $b$ in exclusive mode.

In the distributed shared memory (DSM) model, each register is forever assigned to a
single process and is remote to all the others. Any access of a remote register causes
an RMR.
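The three cost models can be simulated by counting RMRs as processes read and write named base objects. This is a simplified sketch of the rules stated above (class names are hypothetical; there is no real cache, only bookkeeping of which copies are valid). It shows why spinning is cheap in these models: under CC a spinning reader pays again only after the register is written, and under DSM it pays nothing if the register is local to it, which is why Algorithm 1 has $p_i$ spin on $Lock[p_i][\cdot]$.

```python
from typing import Dict, Set, Tuple


class WriteThroughCC:
    """Counts RMRs under the write-through cache-coherent protocol."""

    def __init__(self) -> None:
        self.valid: Set[Tuple[int, str]] = set()  # (process, object) with a valid cached copy
        self.rmrs = 0

    def read(self, p: int, b: str) -> None:
        if (p, b) not in self.valid:
            self.rmrs += 1           # fetch b from main memory
            self.valid.add((p, b))

    def write(self, p: int, b: str) -> None:
        self.rmrs += 1               # every write goes to main memory
        self.valid = {(q, c) for (q, c) in self.valid if c != b}  # invalidate all copies of b


class WriteBackCC:
    """Counts RMRs under the write-back protocol with shared/exclusive cached copies."""

    def __init__(self) -> None:
        self.mode: Dict[Tuple[int, str], str] = {}  # (process, object) -> "shared" | "exclusive"
        self.rmrs = 0

    def read(self, p: int, b: str) -> None:
        if (p, b) in self.mode:
            return                   # shared or exclusive copy: local access
        self.rmrs += 1
        for key in [k for k, m in self.mode.items() if k[1] == b and m == "exclusive"]:
            del self.mode[key]       # invalidate exclusive copies (written back)
        self.mode[(p, b)] = "shared"

    def write(self, p: int, b: str) -> None:
        if self.mode.get((p, b)) == "exclusive":
            return                   # exclusive copy: local access
        self.rmrs += 1
        for key in [k for k in self.mode if k[1] == b]:
            del self.mode[key]       # invalidate all cached copies of b
        self.mode[(p, b)] = "exclusive"


class DSM:
    """Each register is permanently local to one process; other accesses are RMRs."""

    def __init__(self, owner: Dict[str, int]) -> None:
        self.owner = owner
        self.rmrs = 0

    def access(self, p: int, b: str) -> None:
        if self.owner[b] != p:
            self.rmrs += 1

    read = write = access
```

For instance, 100 consecutive reads of the same register by one process cost a single RMR under write-through CC; a write by another process and one more read add two more.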

Mutual exclusion. The mutex object supports two operations, Enter and Exit, both
of which return the response ok. We say that a process $p_i$ is in the critical section
after an execution $\pi$ if $\pi$ contains the invocation of Enter by $p_i$ that returns ok, but
does not contain a subsequent invocation of Exit by $p_i$.

A mutual exclusion implementation satisfies the following properties: 

(Mutual-exclusion) After any execution $\pi$, there exists at most one process that
is in the critical section.

(Deadlock-freedom) Let $\pi$ be any execution that contains the invocation of
Enter by process $p_i$. Then, in every extension of $\pi$ in which every process takes
infinitely many steps, some process is in the critical section.

(Finite-exit) Every process completes the Exit operation within a finite number
of steps.
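Of the three properties, only mutual exclusion is a safety property that can be checked on a finite trace; deadlock-freedom and finite-exit concern infinite extensions and are not captured by such a check. A minimal checker, assuming a hypothetical trace encoding with one event when Enter returns ok and one when Exit is invoked:

```python
from typing import Iterable, Tuple


def satisfies_mutual_exclusion(trace: Iterable[Tuple[str, int]]) -> bool:
    """trace: events ("enter_ok", pid) when Enter returns ok and ("exit_inv", pid)
    when Exit is invoked, in execution order (a hypothetical encoding).
    Returns False as soon as two processes are in the critical section."""
    in_cs = set()
    for kind, pid in trace:
        if kind == "enter_ok":
            in_cs.add(pid)
            if len(in_cs) > 1:
                return False
        elif kind == "exit_inv":
            in_cs.discard(pid)
    return True
```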

5.1 Mutual exclusion from a strongly progressive TM 

We describe an implementation of a mutex object $L(M)$ from a strictly serializable,
strongly progressive TM implementation $M$ (Algorithm 1). The algorithm is based
on the mutex implementation in [17].

Given a sequential implementation, we use a TM to execute the sequential code
in a concurrent environment by encapsulating each sequential operation within an
atomic transaction that replaces each read and write of a t-object with the
transactional read and write implementations, respectively. If the transaction commits,
then the result of the operation is returned; otherwise, if one of the transactional
operations aborts, the operation returns false. For instance, in Algorithm 1, we wish to
atomically read a t-object $X$, write a new value to it and return the old value of $X$
prior to this write. To achieve this, we employ a strictly serializable TM implementation
$M$. Moreover, we assume that $M$ is strongly progressive, i.e., in every execution, at
least one of the concurrent transactions on $X$ successfully commits and the value of
$X$ is returned.
Shared objects. We associate each process $p_i$ with two alternating identities
$[p_i, face_i]$, $face_i \in \{0, 1\}$. The strongly progressive TM implementation $M$ is used
to enqueue processes that attempt to enter the critical section within a single
t-object $X$ (initially $\bot$). For each $[p_i, face_i]$, $L(M)$ uses a register bit $Done[p_i, face_i]$
that indicates whether this face of the process has left the critical section or is executing
the Entry operation. Additionally, we use a register $Succ[p_i, face_i]$ that stores the
process expected to succeed $p_i$ in the critical section. If $Succ[p_i, face_i] = p_j$, we
say that $p_j$ is the successor of $p_i$ (and $p_i$ is the predecessor of $p_j$). Intuitively,
this means that $p_j$ is expected to enter the critical section immediately after $p_i$.
Finally, $L(M)$ uses a two-dimensional bit array $Lock$: for each process $p_i$, there are
$n-1$ registers associated with the other processes. For all $j \in \{1, \ldots, n\} \setminus \{i\}$,
the registers $Lock[p_i][p_j]$ are local to $p_i$ and the registers $Lock[p_j][p_i]$ are remote to $p_i$.
Process $p_i$ accesses only these registers of the $Lock$ array, i.e., $Lock[p_i][p_j]$ and $Lock[p_j][p_i]$ for $j \neq i$.

Entry operation. A process p_i adopts a new identity face_i and writes false to Done[p_i, face_i] to indicate that p_i has started the Entry operation. Process p_i then initializes the successor of [p_i, face_i] by writing ⊥ to Succ[p_i, face_i]. Next, p_i uses the strongly progressive TM implementation M to atomically store its pid and identity, i.e., [p_i, face_i], in t-object X and to obtain the pid and identity of its predecessor, say [p_j, face_j]. Intuitively, this means that [p_i, face_i] is scheduled to enter the critical section immediately after [p_j, face_j] exits the critical section. Note that if p_i reads the initial value of t-object X, then it immediately enters the critical section. Otherwise, it writes locked to the register Lock[p_i][p_j] and sets itself to be the successor of [p_j, face_j] by writing p_i to Succ[p_j, face_j]. Process p_i then checks whether p_j has started the Exit operation by checking whether Done[p_j, face_j] is set. If it is, p_i enters the critical section; otherwise, p_i spins on the register Lock[p_i][p_j] until it is unlocked.

Exit operation. Process p_i first indicates that it has exited the critical section by setting Done[p_i, face_i], and then, if it has a successor, unlocks the register Lock[Succ[p_i, face_i]][p_i] to allow p_i's successor to enter the critical section.
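The Entry and Exit operations can be sketched in Python under explicit assumptions: processes are threads indexed 0..n−1, None plays the role of ⊥, and the transaction on X executed with M is replaced by a lock-protected swap (the hypothetical class AtomicSwap), so aborts are not modeled. The names QueueMutex, enter and exit are illustrative and do not come from Algorithm 1. The sketch relies on CPython executing these list reads and writes atomically and in program order.

```python
import threading
import time


class AtomicSwap:
    """Stand-in for the transaction on X executed with M (assumption: a lock
    replaces the TM, so aborts are not modeled)."""

    def __init__(self):
        self._x = None                   # None plays the role of ⊥
        self._lock = threading.Lock()

    def func(self, ident):
        """Atomically write ident to X and return the previous value of X."""
        with self._lock:
            old, self._x = self._x, ident
            return old


class QueueMutex:
    """Hypothetical sketch of L(M) for processes 0..n-1."""

    def __init__(self, n):
        self.x = AtomicSwap()
        self.face = [0] * n                           # current face, private to each process
        self.done = [[True, True] for _ in range(n)]  # Done[p][face]
        self.succ = [[None, None] for _ in range(n)]  # Succ[p][face]
        self.lock = [[False] * n for _ in range(n)]   # Lock[p][q]; True = locked; row p is local to p

    def enter(self, i):
        self.face[i] = 1 - self.face[i]               # adopt the other identity
        f = self.face[i]
        self.done[i][f] = False                       # this face is in its Entry operation
        self.succ[i][f] = None                        # no successor yet
        prev = self.x.func((i, f))                    # enqueue [p_i, face_i], learn the predecessor
        if prev is None:
            return                                    # no predecessor: enter immediately
        j, fj = prev
        self.lock[i][j] = True                        # lock Lock[p_i][p_j]
        self.succ[j][fj] = i                          # become the successor of [p_j, face_j]
        if not self.done[j][fj]:                      # predecessor has not started its Exit
            while self.lock[i][j]:                    # spin until p_j unlocks
                time.sleep(0)                         # yield to other threads

    def exit(self, i):
        f = self.face[i]
        self.done[i][f] = True                        # this face has left the critical section
        s = self.succ[i][f]
        if s is not None:
            self.lock[s][i] = False                   # release the successor
```

A process only ever spins on its own row Lock[i][·], and the only write that can end its wait is the single unlock performed by its predecessor in Exit.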


5.2 Proof of correctness 


Lemma 5. The implementation L(M) (Algorithm 1) satisfies mutual exclusion.

Proof. Let E be any execution of L(M). We say that [p_i, face_i] is the successor of [p_j, face_j] if p_i reads the value of prev in Line 25 to be [p_j, face_j] (and then [p_j, face_j] is the predecessor of [p_i, face_i]); otherwise, if p_i reads the value to be ⊥, we say that p_i has no predecessor.

Suppose by contradiction that there exist processes p_i and p_j that are both inside the critical section after E. Since p_i is inside the critical section, either (1) p_i read prev = ⊥ in Line 23, or (2) p_i read that Done[prev] is true (Line 29), or p_i read that Done[prev] is false and that Lock[p_i][prev.pid] is unlocked (Line 30).


(Case 1) Suppose that p_i read prev = ⊥ and entered the critical section. Since in this case p_i does not have any predecessor, some other process that returns successfully from the while loop in Line 25 must be the successor of p_i in E. Since [p_j, face_j] is also inside the critical section after E, p_j reads either [p_i, face_i] or some other process to be its predecessor. Observe that there must exist some such process [p_k, face_k] whose predecessor is [p_i, face_i]. Hence, without loss of generality, we can assume that [p_j, face_j] is the successor of [p_i, face_i]. By our assumption, [p_j, face_j] is also inside the critical section. Thus, p_j locked the register Lock[p_j][p_i] in Line 27 and set itself to be p_i's successor in Line 28. Then, in Line 29, p_j either read that Done[p_i, face_i] is true, or read that Done[p_i, face_i] is false, waited until Lock[p_j][p_i] was unlocked, and then entered the critical section. But this is possible only if p_i has left the critical section and updated the registers Done[p_i, face_i] and Lock[p_j][p_i] in its Exit operation (the latter in Line 37), which contradicts the assumption that [p_i, face_i] is also inside the critical section after E.

(Case 2) Suppose that p_i did not read prev = ⊥ and entered the critical section. Thus, p_i read that Done[prev] is false in Line 29 and that Lock[p_i][prev.pid] is unlocked in Line 30, where prev is the predecessor of [p_i, face_i]. As in Case 1, without loss of generality, we can assume that [p_j, face_j] is either the successor or the predecessor of [p_i, face_i].

Suppose that [p_j, face_j] is the predecessor of [p_i, face_i], i.e., p_i writes the value [p_i, face_i] to the register Succ[p_j, face_j] in Line 28. Since [p_j, face_j] is also inside the critical section after E, it has not yet set Done[p_j, face_j] in its Exit operation, so process p_i must read that Done[p_j, face_j] is false in Line 29 and that Lock[p_i][p_j] is locked in Line 30. But then p_i could not have entered the critical section after E, a contradiction.

Suppose that [p_j, face_j] is the successor of [p_i, face_i], i.e., p_j writes the value [p_j, face_j] to the register Succ[p_i, face_i]. Since both p_i and p_j are inside the critical section after E, p_i has not yet set Done[p_i, face_i], so process p_j must read that Done[p_i, face_i] is false in Line 29 and that Lock[p_j][p_i] is locked in Line 30. Thus, p_j must spin on the register Lock[p_j][p_i], waiting for it to be unlocked by p_i before entering the critical section, a contradiction to the assumption that both p_i and p_j are inside the critical section.

Thus, L(M) satisfies mutual exclusion. □

Lemma 6. The implementation L(M) (Algorithm 1) provides deadlock-freedom.

Proof. Let E be any execution of L(M). Observe that a process may be stuck indefinitely only in Lines 25 and 30, as it performs a while loop.

Since M is strongly progressive, in every execution E that contains an invocation of Enter by process p_i, some process returns true from the invocation of func() in Line 25.

Now consider a process p_i that returns successfully from the while loop in Line 25. Suppose that p_i is stuck indefinitely as it performs the while loop in Line 30. Thus, no process has unlocked the register Lock[p_i][prev.pid] by writing to it in the Exit section. Recall that since [p_i, face_i] has reached the while loop in Line 30, [p_i, face_i] necessarily has a predecessor, say [p_j, face_j], and has set itself to be p_j's successor by writing p_i to the register Succ[p_j, face_j] in Line 28. There are two possible cases: the predecessor of [p_j, face_j] is some process p_k with k ≠ i, or the predecessor of [p_j, face_j] is the process p_i itself.
(Case 1) Since, by assumption, process p_j takes infinitely many steps in E, the only reason that p_j is stuck without entering the critical section is that [p_k, face_k] is also stuck in the while loop in Line 30. We can iterate this argument: p_k's predecessor may in turn be a process other than p_i and p_j that is also stuck in the while loop in Line 30. Since there are only finitely many processes, this chain ends, and the last such process must eventually read the corresponding Lock register to be unlocked and enter the critical section. Thus, in every extension of E in which every process takes infinitely many steps, some process will enter the critical section.

(Case 2) Suppose that the predecessor of [p_j, face_j] is the process p_i itself. Thus, while [p_i, face_i] is stuck in the while loop waiting for Lock[p_i][p_j] to be unlocked by process p_j, p_j leaves the critical section, unlocks Lock[p_i][p_j] in Line 37 and, before p_i reads Lock[p_i][p_j], restarts the Entry operation, writes false to Done[p_j, 1 − face_j], sets itself to be the successor of [p_i, face_i], and spins on the register Lock[p_j][p_i]. However, process p_i, which by our assumption takes infinitely many steps, must eventually read that Lock[p_i][p_j] is unlocked and enter the critical section, thus establishing deadlock-freedom. □


We say that a TM implementation M accesses a single t-object if, in every execution E of M and for every transaction T ∈ txns(E), |Dset(T)| ≤ 1. We can now prove the following theorem:


Theorem 7. Any strictly serializable, strongly progressive TM implementation M that accesses a single t-object implies a deadlock-free, finite-exit mutual exclusion implementation L(M) such that the RMR complexity of M is within a constant factor of the RMR complexity of L(M).

Proof. (Mutual exclusion) Follows from Lemma 5.

(Finite-exit) The proof is immediate since the Exit operation contains no unbounded loops or waiting statements.

(Deadlock-freedom) Follows from Lemma 6.

(RMR complexity) First, let us consider the CC model. Observe that every event not on M performed by a process p_i as it performs the Entry or Exit operations clearly incurs O(1) RMR cost, possibly barring the while loop executed in Line 30. During the execution of this while loop, process p_i spins on the register Lock[p_i][p_j], where p_j is the predecessor of p_i. Observe that p_i's cached copy of Lock[p_i][p_j] may be invalidated only by process p_j as it unlocks the register in Line 37. Since no other process may write to this register and p_i terminates the while loop immediately after the write to Lock[p_i][p_j] by p_j, p_i incurs O(1) RMRs. Thus, the overall RMR cost incurred by M is within a constant factor of the RMR cost of L(M).
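The CC argument can be checked on a toy cost model. The sketch below fixes details the text leaves open, as assumptions: a write-through, write-invalidate cache in which every write costs one RMR and invalidates all other cached copies, while a read costs one RMR only if the reader holds no valid copy. The class CCMemory and the function spin_rmrs_cc are hypothetical. The function replays the schedule from the argument: p_i locks Lock[p_i][p_j], polls it k times, p_j unlocks it, and p_i reads it once more and leaves the loop.

```python
class CCMemory:
    """Toy cache-coherent memory (assumption: write-through, write-invalidate).

    A read costs one RMR iff the reader holds no valid cached copy; a write
    always costs one RMR, leaves the writer with a valid copy and invalidates
    every other cached copy of that register.
    """

    def __init__(self):
        self.values = {}
        self.valid = {}   # register -> set of processes holding a valid copy
        self.rmrs = {}    # process -> number of RMRs incurred

    def _charge(self, p):
        self.rmrs[p] = self.rmrs.get(p, 0) + 1

    def read(self, p, reg):
        holders = self.valid.setdefault(reg, set())
        if p not in holders:      # cache miss: fetch the value remotely
            self._charge(p)
            holders.add(p)
        return self.values.get(reg)

    def write(self, p, reg, value):
        self._charge(p)
        self.values[reg] = value
        self.valid[reg] = {p}     # all other cached copies become invalid


def spin_rmrs_cc(k):
    """RMRs p_i incurs on Lock[p_i][p_j] if it polls k times before p_j unlocks."""
    mem = CCMemory()
    reg = ("Lock", "p_i", "p_j")
    mem.write("p_i", reg, True)   # p_i locks Lock[p_i][p_j]
    for _ in range(k):
        mem.read("p_i", reg)      # polling is served from p_i's cache
    mem.write("p_j", reg, False)  # p_j's Exit: the only other write to it
    mem.read("p_i", reg)          # one miss; p_i sees the unlock and leaves
    return mem.rmrs["p_i"]
```

Under this cost model the count is 2 for every k, which is the O(1) bound claimed for the loop; another coherence protocol could change the constant.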

Now we consider the DSM model. As with the reasoning for the CC model, every event not on M performed by a process p_i as it performs the Entry or Exit operations clearly incurs O(1) RMR cost, possibly barring the while loop executed in Line 30. During the execution of this while loop, process p_i spins on the register Lock[p_i][p_j], where p_j is the predecessor of p_i. Recall that Lock[p_i][p_j] is a register that is local to p_i and thus p_i does not incur any RMR cost on account of executing this loop. It follows that p_i incurs O(1) RMR cost in the DSM model. Thus, the overall RMR cost of M is within a constant factor of the RMR cost of L(M) in the DSM model. □
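A matching toy check for the DSM model follows. It assumes that every register is stored in the memory module of exactly one process and that Done[p][·], Succ[p][·] and Lock[p][·] are all local to p; the text only fixes the placement of Lock. The class DSMMemory and the function entry_rmrs_dsm are hypothetical, and the transaction on X executed with M is deliberately left out, matching the argument's focus on events not on M. The function replays one Entry of p_i in which p_i polls Lock[p_i][p_j] k times before p_j's Exit releases it.

```python
class DSMMemory:
    """Toy DSM model: each register lives in the memory module of one process.

    Registers are named as tuples (array, owner, ...); an access costs one RMR
    iff the accessing process is not the owner (assumption: Done[p][*],
    Succ[p][*] and Lock[p][*] are all local to p).
    """

    def __init__(self):
        self.values = {}
        self.rmrs = {}    # process -> number of RMRs incurred

    def _access(self, p, reg):
        if p != reg[1]:   # reg[1] is the owning process
            self.rmrs[p] = self.rmrs.get(p, 0) + 1

    def read(self, p, reg):
        self._access(p, reg)
        return self.values.get(reg)

    def write(self, p, reg, value):
        self._access(p, reg)
        self.values[reg] = value


def entry_rmrs_dsm(k):
    """RMRs (outside M) incurred by p_i in one Entry in which it polls k times."""
    mem = DSMMemory()
    mem.write("p_j", ("Done", "p_j", 0), False)       # p_j (face 0) is in the CS
    mem.write("p_i", ("Done", "p_i", 1), False)       # p_i starts Entry with face 1
    mem.write("p_i", ("Succ", "p_i", 1), None)
    # The transaction on X that returns [p_j, 0] is executed with M; not counted.
    mem.write("p_i", ("Lock", "p_i", "p_j"), True)    # local to p_i
    mem.write("p_i", ("Succ", "p_j", 0), "p_i")       # remote: owned by p_j
    if not mem.read("p_i", ("Done", "p_j", 0)):       # remote: owned by p_j
        for _ in range(k):
            mem.read("p_i", ("Lock", "p_i", "p_j"))   # local polling, no RMRs
        mem.write("p_j", ("Done", "p_j", 0), True)    # p_j's Exit ...
        mem.read("p_j", ("Succ", "p_j", 0))
        mem.write("p_j", ("Lock", "p_i", "p_j"), False)  # ... releases p_i
        mem.read("p_i", ("Lock", "p_i", "p_j"))       # p_i sees the unlock
    return mem.rmrs.get("p_i", 0)
```

Here p_i pays only for writing Succ[p_j, 0] and reading Done[p_j, 0], so its count is 2 regardless of k; the spinning itself is free because it stays in p_i's own module.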

Theorem 8. Any deadlock-free, finite-exit mutual exclusion implementation from read, write and conditional primitives has an execution whose RMR complexity is Ω(n log n).

Theorems 7 and 8 imply:

Theorem 9. Any strictly serializable, strongly progressive TM implementation from read, write and conditional primitives that accesses a single t-object has an execution whose RMR complexity is Ω(n log n).
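Spelled out, the combination is a one-line inequality. The derivation below reads "within a constant factor" in Theorem 7 as the existence of a constant c > 0 such that RMR_{L(M)}(E) ≤ c · RMR_M(E) for every execution E of L(M), where RMR_M(E) counts only the events of the transactions executed with M; this reading, the constant α and the name E* are our assumptions for illustration.

```latex
\begin{aligned}
&\text{(Theorem 8)} && \exists E^{*}:\ \mathrm{RMR}_{L(M)}(E^{*}) \ge \alpha\, n \log n
  \quad (\alpha > 0,\ n \text{ sufficiently large}),\\
&\text{(Theorem 7)} && \mathrm{RMR}_{L(M)}(E^{*}) \le c \cdot \mathrm{RMR}_{M}(E^{*}),\\
&\Longrightarrow && \mathrm{RMR}_{M}(E^{*}) \ge \frac{\alpha}{c}\, n \log n \in \Omega(n \log n).
\end{aligned}
```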


6 Related work and concluding remarks 


Our read-validation lower bound improves the read-validation step-complexity lower bound derived for strict data partitioning (a very strong version of DAP) and (strong) invisible reads. In a strictly data-partitioned TM, the set of base objects used by the TM is split into disjoint sets, each storing information only about a single data item. Every TM implementation that is strictly data-partitioned satisfies weak DAP, but not vice versa. The definition of invisible reads assumed in [13, 14] requires that a t-read operation does not apply nontrivial events in any execution. Our lower bound, however, assumes only weak invisible reads, stipulating that the t-read operations of a transaction T do not apply nontrivial events only when T is not concurrent with any other transaction.

The notion of weak DAP used in this paper was introduced by Attiya et al.

Proving a lower bound for a concurrent object by reduction to a form of mutual exclusion is a technique that has been used in prior work. Guerraoui and Kapalka [14] proved that it is impossible to implement strictly serializable, strongly progressive TMs that provide wait-free TM-liveness (every t-operation returns a matching response within a finite number of steps) using only read and write primitives. Alistarh et al. proved a lower bound on the RMR complexity of the renaming problem [1]. Our reduction algorithm (Section 5.1) is inspired by the O(1) RMR mutual exclusion algorithm by Lee [17].

To the best of our knowledge, the TM properties assumed for our read-validation lower bound cover all of the TM implementations that are subject to the validation step complexity.

It is easy to see that the read-validation lower bound is tight for both strict serializability and opacity. We refer to DSTM [16] for a matching upper bound.

Finally, we conjecture that the RMR lower bound of Theorem 9 is tight. Proving this remains an interesting open question.